AITopics | Fact Book

Collaborating Authors

Fact Book

Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts

Gupta, Raavi, Panicker, Pranav Hari, Bhatia, Sumit, Ramakrishnan, Ganesh

arXiv.org Artificial IntelligenceNov-18-2025

Large language models (LLMs), despite their remarkable text generation capabilities, often hallucinate and generate text that is factually incorrect and not grounded in real-world knowledge. This poses serious risks in domains like healthcare, finance, and customer support. A typical way to use LLMs is via the APIs provided by LLM vendors where there is no access to model weights or options to fine-tune the model. Existing methods to detect hallucinations in such settings where the model access is restricted or constrained by resources typically require making multiple LLM API calls, increasing latency and API cost. We introduce CONFACTCHECK, an efficient hallucination detection approach that does not leverage any external knowledge base and works on the simple intuition that responses to factual probes within the generated text should be consistent within a single LLM and across different LLMs. Rigorous empirical evaluation on multiple datasets that cover both the generation of factual texts and the open generation shows that CONFACTCHECK can detect hallucinated facts efficiently using fewer resources and achieves higher accuracy scores compared to existing baselines that operate under similar conditions. Our code is available here.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.12236

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre:

Research Report (1.00)
Overview > Fact Book (0.43)

Industry:

Leisure & Entertainment > Sports (0.47)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

A critique of pure stupidity: understanding Trump 2.0

The GuardianOct-2-2025, 04:00:37 GMT

President Donald Trump holds charts as he speaks about the economy in the Oval Office, August 2025. President Donald Trump holds charts as he speaks about the economy in the Oval Office, August 2025. If the first term of Donald Trump provoked anxiety over the fate of objective knowledge, the second has led to claims we live in a world-historical age of stupid, accelerated by big tech. But might there be a way out? T he first and second Trump administrations have provoked markedly different critical reactions. The shock of 2016 and its aftermath saw a wave of liberal anxiety about the fate of objective knowledge, not only in the US but also in Britain, where the Brexit referendum that year had been won by a campaign that misrepresented key facts and figures.

arendt, judgment, stupidity, (15 more...)

The Guardian

Country:

Europe > United Kingdom (0.69)
Oceania > Australia (0.04)
North America > United States > Utah (0.04)
(5 more...)

Genre: Overview > Fact Book (0.54)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (0.95)
Information Technology > Artificial Intelligence (0.95)

Add feedback

Aligning LLMs for the Classroom with Knowledge-Based Retrieval -- A Comparative RAG Study

Jain, Amay, Cui, Liu, Chen, Si

arXiv.org Artificial IntelligenceSep-10-2025

Large language models like ChatGPT are increasingly used in classrooms, but they often provide outdated or fabricated information that can mislead students. Retrieval Augmented Generation (RAG) improves reliability of LLMs by grounding responses in external resources. We investigate two accessible RAG paradigms, vector-based retrieval and graph-based retrieval to identify best practices for classroom question answering (QA). Existing comparative studies fail to account for pedagogical factors such as educational disciplines, question types, and practical deployment costs. Using a novel dataset, EduScopeQA, of 3,176 questions across academic subjects, we measure performance on various educational query types, from specific facts to broad thematic discussions. We also evaluate system alignment with a dataset of systematically altered textbooks that contradict the LLM's latent knowledge. We find that OpenAI Vector Search RAG (representing vector-based RAG) performs well as a low-cost generalist, especially for quick fact retrieval. On the other hand, GraphRAG Global excels at providing pedagogically rich answers to thematic queries, and GraphRAG Local achieves the highest accuracy with the dense, altered textbooks when corpus integrity is critical. Accounting for the 10-20x higher resource usage of GraphRAG (representing graph-based RAG), we show that a dynamic branching framework that routes queries to the optimal retrieval method boosts fidelity and efficiency. These insights provide actionable guidelines for educators and system designers to integrate RAG-augmented LLMs into learning environments effectively.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.07846

Country: North America > United States > Pennsylvania (0.14)

Genre:

Research Report (1.00)
Overview > Fact Book (0.34)

Industry: Education > Educational Setting > K-12 Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs

Nguyen, Tin, Bolton, Logan, Taesiri, Mohammad Reza, Nguyen, Anh Totti

arXiv.org Artificial IntelligenceMar-4-2025

An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response mixed of factual and non-factual statements poses a challenge for humans to verify and accurately base their decisions on. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, LLMs would first re-format the question to add XML tags highlighting key facts, and then, generate a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain of thought prompting (CoT) on a wide range of 17 tasks from arithmetic, reading comprehension to logical reasoning. When asking humans to verify LLM responses, highlights help time-limited participants to more accurately and efficiently recognize when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoTs tend to make users believe that an answer is correct.

accuracy, highlighted chain, reformatted question, (16 more...)

arXiv.org Artificial Intelligence

2503.02003

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(22 more...)

Genre:

Research Report (1.00)
Overview > Fact Book (0.34)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation

Vu, Minh Duc, Chen, Jieshan, Xing, Zhenchang, Lu, Qinghua, Xu, Xiwei, Fu, Qian

arXiv.org Artificial IntelligenceFeb-25-2025

With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of data and struggle to generate narratives that fully capture the semantics of the dataset or align the fact sheet with specific user needs. Addressing these shortcomings, this paper introduces \tool, a novel tool designed for the automatic generation and customisation of fact sheets. \tool applies the concept of collaborative AI workers to transform raw tabular dataset into comprehensive, visually compelling fact sheets. We define effective taxonomy to profile AI worker for specialised tasks. Furthermore, \tool empowers users to refine these fact sheets through intuitive natural language commands, ensuring the final outputs align closely with individual preferences and requirements. Our user evaluation with 18 participants confirms that \tool not only surpasses state-of-the-art baselines in automated fact sheet production but also provides a positive user experience during customization tasks.

dataset, fact sheet, factflow, (15 more...)

arXiv.org Artificial Intelligence

2502.17909

Country:

Europe > France (0.04)
North America > United States (0.04)
Europe > Italy (0.04)
(3 more...)

Genre: Overview > Fact Book (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Media > Film (0.68)

Technology:

Information Technology > Visualization (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science (1.00)
(5 more...)

Add feedback

STRUX: An LLM for Decision-Making with Structured Explanations

Lu, Yiming, Hu, Yebowen, Foroosh, Hassan, Jin, Wei, Liu, Fei

arXiv.org Artificial IntelligenceOct-16-2024

Countless decisions shape our daily lives, and it is paramount to understand the how and why behind these choices. In this paper, we introduce a new LLM decision-making framework called STRUX, which enhances LLM decision-making by providing structured explanations. These include favorable and adverse facts related to the decision, along with their respective strengths. STRUX begins by distilling lengthy information into a concise table of key facts. It then employs a series of self-reflection steps to determine which of these facts are pivotal, categorizing them as either favorable or adverse in relation to a specific decision. Lastly, we fine-tune an LLM to identify and prioritize these key facts to optimize decision-making. STRUX has been evaluated on the challenging task of forecasting stock investment decisions based on earnings call transcripts and demonstrated superior performance against strong baselines. It enhances decision transparency by allowing users to understand the impact of different factors, representing a meaningful step towards practical decision-making with LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.12583

Country:

North America > United States (0.28)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(3 more...)

Genre:

Financial News (1.00)
Overview > Fact Book (0.55)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment

Li, Haitao, Ai, Qingyao, Han, Xinyan, Chen, Jia, Dong, Qian, Liu, Yiqun, Chen, Chong, Tian, Qi

arXiv.org Artificial IntelligenceMar-27-2024

Recent research demonstrates the effectiveness of using pre-trained language models for legal case retrieval. Most of the existing works focus on improving the representation ability for the contextualized embedding of the [CLS] token and calculate relevance using textual semantic similarity. However, in the legal domain, textual semantic similarity does not always imply that the cases are relevant enough. Instead, relevance in legal cases primarily depends on the similarity of key facts that impact the final judgment. Without proper treatments, the discriminative ability of learned representations could be limited since legal cases are lengthy and contain numerous non-key facts. To this end, we introduce DELTA, a discriminative model designed for legal case retrieval. The basic idea involves pinpointing key facts in legal cases and pulling the contextualized embedding of the [CLS] token closer to the key facts while pushing away from the non-key facts, which can warm up the case embedding space in an unsupervised manner. To be specific, this study brings the word alignment mechanism to the contextual masked auto-encoder. First, we leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Second, we employ the deep decoder to enable translation between different structures, with the goal of pinpointing key facts to enhance discriminative ability. Comprehensive experiments conducted on publicly available legal benchmarks show that our approach can outperform existing state-of-the-art methods in legal case retrieval. It provides a new perspective on the in-depth understanding and processing of legal case documents.

delta, key fact, retrieval, (11 more...)

arXiv.org Artificial Intelligence

2403.18435

Country:

Asia > China (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Overview > Fact Book (1.00)
Research Report > New Finding (0.88)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)

Add feedback

Multi-Query Focused Disaster Summarization via Instruction-Based Prompting

Seeberger, Philipp, Riedhammer, Korbinian

arXiv.org Artificial IntelligenceFeb-14-2024

Automatic summarization of mass-emergency events plays a critical role in disaster management. The second edition of CrisisFACTS aims to advance disaster summarization based on multi-stream fact-finding with a focus on web sources such as Twitter, Reddit, Facebook, and Webnews. Here, participants are asked to develop systems that can extract key facts from several disaster-related events, which ultimately serve as a summary. This paper describes our method to tackle this challenging task. We follow previous work and propose to use a combination of retrieval, reranking, and an embarrassingly simple instruction-following summarization. The two-stage retrieval pipeline relies on BM25 and MonoT5, while the summarizer module is based on the open-source Large Language Model (LLM) LLaMA-13b. For summarization, we explore a Question Answering (QA)-motivated prompting approach and find the evidence useful for extracting query-relevant facts. The automatic metrics and human evaluation show strong results but also highlight the gap between open-source and proprietary systems.

event nugget, proceedings, summarization, (10 more...)

arXiv.org Artificial Intelligence

2402.09008

Country:

Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.14)
Europe > Germany > Hesse > Darmstadt Region > Wiesbaden (0.05)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
(6 more...)

Genre: Overview > Fact Book (0.34)

Industry: Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Single Sequence Prediction over Reasoning Graphs for Multi-hop QA

Ramesh, Gowtham, Sreedhar, Makesh, Hu, Junjie

arXiv.org Artificial IntelligenceJul-1-2023

Recent generative approaches for multi-hop question answering (QA) utilize the fusion-in-decoder method~\cite{izacard-grave-2021-leveraging} to generate a single sequence output which includes both a final answer and a reasoning path taken to arrive at that answer, such as passage titles and key facts from those passages. While such models can lead to better interpretability and high quantitative scores, they often have difficulty accurately identifying the passages corresponding to key entities in the context, resulting in incorrect passage hops and a lack of faithfulness in the reasoning path. To address this, we propose a single-sequence prediction method over a local reasoning graph (\model)\footnote{Code/Models will be released at \url{https://github.com/gowtham1997/SeqGraph}} that integrates a graph structure connecting key entities in each context passage to relevant subsequent passages for each question. We use a graph neural network to encode this graph structure and fuse the resulting representations into the entity representations of the model. Our experiments show significant improvements in answer exact-match/F1 scores and faithfulness of grounding in the reasoning path on the HotpotQA dataset and achieve state-of-the-art numbers on the Musique dataset with only up to a 4\% increase in model parameters.

information retrieval, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2307.00335

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > France (0.04)
(5 more...)

Genre:

Research Report (0.50)
Overview > Fact Book (0.34)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Add feedback

APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

Sun, Jiashuo, Zhang, Hang, Lin, Chen, Gong, Yeyun, Guo, Jian, Duan, Nan

arXiv.org Artificial IntelligenceFeb-13-2023

Long-form numerical reasoning in financial analysis aims to generate a reasoning program to calculate the correct answer for a given question. Previous work followed a retriever-generator framework, where the retriever selects key facts from a long-form document, and the generator generates a reasoning program based on retrieved facts. However, they treated all facts equally without considering the different contributions of facts with and without numbers. Meanwhile, the program consistency were ignored under supervised training, resulting in lower training accuracy and diversity. To solve these problems, we proposed APOLLO to improve the long-form numerical reasoning framework. For the retriever, we adopt a number-aware negative sampling strategy to enable the retriever to be more discriminative on key numerical facts. For the generator, we design consistency-based reinforcement learning and target program augmentation strategy based on the consistency of program execution results. Experimental results on the FinQA and ConvFinQA leaderboard verify the effectiveness of our proposed method, achieving the new state-of-the-art.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2212.07249

Country: Asia > China > Fujian Province > Xiamen (0.04)

Genre:

Research Report (0.40)
Overview > Fact Book (0.34)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback